
6 research outputs found

Contract-Based Programming on Modern C++

Author: Cabrero Holgueras José
Publication date: 13/07/2018

Contract-based programming, or Design by Contract (DBC), is a discipline for system construction that in recent years has been put forward as one of the most solid and reliable models for software creation. It is well known that a large number of projects in the software industry are not developed successfully. The main cause of failure is that projects do not meet user expectations. In this context, Design by Contract emerges as a way to decrease this failure rate. This philosophy provides a set of mechanisms for validating part of the requirements specification. In recent years, several programming languages have started to implement DBC, either as part of the language or as an external library. The main languages and tools that support contract-based programming are Ada 2012, SPARK, Eiffel, D, C# Code Contracts and the Microsoft Source-Code Annotation Language (Microsoft SAL). Traditionally, C++ has been a programming language focused on flexibility, performance and efficiency, which has attracted many people to use it in their projects. However, trends make programming languages change, and the interests of the industry are leaning towards solid solutions, which must include reliable frameworks. With this purpose, a specification for Design by Contract in C++ has been designed. This new specification has been accepted by the ISO C++ committee for inclusion in C++20. It includes several clauses that allow the user to write pre- and postconditions in the code. This allows part of the requirements specification to be merged into the code, enabling traceability between the phases of the software project. Specifying a new feature in a programming language implies changes in how the language is understood by a compiler, and implementing it requires changes at several levels of the compiler.
This document describes these changes. Additionally, it provides an overview of the structure of a compiler and a brief description of all the parts of the Clang C++ compiler.
Computer Engineering (Ingeniería Informática)

Universidad Carlos III de Madrid e-Archivo

Towards Improved Homomorphic Encryption for Privacy-Preserving Deep Learning

Author: Cabrero Holgueras José
Publication date: 27/06/2023

International Mention in the doctoral degree (Mención Internacional en el título de doctor)
Deep Learning (DL) has brought about a remarkable transformation in many fields, heralded by some as a new technological revolution. The advent of large-scale models has increased the demands for data and computing platforms, for which cloud computing has become the go-to solution. However, the uptake of DL and cloud computing is limited in privacy-enforcing areas that deal with sensitive data. These areas call for privacy-enhancing technologies that enable responsible, ethical, and privacy-compliant use of data in potentially hostile environments. To this end, the cryptography community has addressed these concerns with what is known as Privacy-Preserving Computation Techniques (PPCTs), a set of tools that enable privacy-enhancing protocols where cleartext access to information is no longer tenable. Of these techniques, Homomorphic Encryption (HE) stands out for its ability to perform operations over encrypted data without compromising data confidentiality or privacy. However, despite its promise, HE is still a relatively nascent solution with efficiency and usability limitations. Improving the efficiency of HE has been a longstanding challenge in the field of cryptography, and with each improvement the complexity of the techniques has increased, especially for non-experts. In this thesis, we address the problem of the complexity of HE when applied to DL. We begin by systematizing existing knowledge in the field through an in-depth analysis of the state of the art in privacy-preserving deep learning, identifying key trends, research gaps, and issues associated with current approaches. One such gap lies in the need for vectorized algorithms when using Packed Homomorphic Encryption (PaHE), a state-of-the-art technique to reduce the overhead of HE in complex areas.
This thesis comprehensively analyzes existing algorithms and proposes new ones for using DL with PaHE, presenting a formal analysis and usage guidelines for their implementation. Parameter selection of HE schemes is another recurring challenge in the literature, given that it plays a critical role in determining not only the security of the instantiation but also the precision, performance, and degree of security of the scheme. To address this challenge, this thesis proposes a novel system combining fuzzy logic with linear programming tasks to produce secure parametrizations based on high-level user input arguments without requiring low-level knowledge of the underlying primitives. Finally, this thesis describes HEFactory, a symbolic execution compiler designed to streamline the process of producing HE code and integrating it with Python. HEFactory implements the previous proposals presented in this thesis in an easy-to-use tool. It provides a unique architecture that layers the challenges associated with HE and produces simplified operations interpretable by low-level HE libraries. HEFactory significantly reduces the overall complexity of coding DL applications using HE, resulting in an 80% reduction in code length compared with expert-written code while maintaining equivalent accuracy and efficiency.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid
Chair: María Isabel González Vasco. Secretary: David Arroyo Guardeño. Member: Antonis Michala

Universidad Carlos III de Madrid e-Archivo

A methodology for large-scale identification of related accounts in underground forums

Author: Cabrero Holgueras José
Pastrana Portillo Sergio
Publication venue: 'Elsevier BV'
Publication date: 01/12/2021

Underground forums allow users to interact with communities focused on illicit activities. They serve as an entry point for actors interested in deviant and criminal topics. Due to the pseudo-anonymity provided, they have become improvised marketplaces for trading illegal products and services, including those used to conduct cyberattacks. Thus, these forums are an important data source for threat intelligence analysts and law enforcement. The use of multiple accounts is forbidden in most forums since these are mostly used for malicious purposes. Still, this is a common practice. Being able to identify an actor or gang behind multiple accounts allows for proper attribution in online investigations, and also to design intervention mechanisms for illegal activities. Existing solutions for multi-account detection either require ground truth data to conduct supervised classification or use manual approaches. In this work, we propose a methodology for the large-scale identification of related accounts in underground forums. These accounts are similar according to the distinctive content posted, and thus are likely to belong to the same actor or group. The methodology applies to various domains and leverages distinctive artefacts and personal information left online by the users. We provide experimental results on a large dataset comprising more than 1.1M user accounts from 15 different forums. We show how this methodology, combined with existing approaches commonly used in social media forensics, can assist with and improve online investigations.
This work was partially supported by CERN openlab, the CERN Doctoral Student Programme, the Spanish grants ODIO (PID2019-111429RB-C21 and PID2019-111429RB) and the Region of Madrid grant CYNAMON-CM (P2018/TCS-4566), co-financed by European Structural Funds ESF and FEDER, and Excellence Program EPUC3M1

Universidad Carlos III de Madrid e-Archivo

HEFactory: A symbolic execution compiler for privacy-preserving Deep Learning with Homomorphic Encryption

Author: José Cabrero-Holgueras
Sergio Pastrana
Publication venue: 'Elsevier BV'
Publication date: 01/05/2023

Homomorphic Encryption (HE) allows computing operations on encrypted data, and it is a potential solution to enable Deep Learning (DL) in privacy-enforcing scenarios (e.g., sending private data to cloud services). However, HE remains a complex technology with multiple challenges that prevent successful application by non-experts. In this work, we present HEFactory, a compiler that assists in building HE applications in Python, both general-purpose and Deep Learning, with a focus on non-expert data scientists. HEFactory relies on a layered architecture that deals with challenges such as automatic parameter selection and the specific data representation of HE applications. Our benchmarks show that HEFactory substantially lowers the programming complexity (i.e., a reduction of 80% in the number of lines of code) with negligible performance overhead over programs written by experts using native HE frameworks.

Directory of Open Access Journals

Towards automated homomorphic encryption parameter selection with fuzzy logic and linear programming

Author: Cabrero-Holgueras José
Pastrana Sergio
Publication date: 17/02/2023

Homomorphic Encryption (HE) is a set of powerful properties of certain cryptosystems that allow for privacy-preserving operations over encrypted text. Still, HE is not widespread due to limitations in terms of efficiency and usability. Among the challenges of HE, scheme parametrization (i.e., the selection of appropriate parameters within the algorithms) is a relevant multifaceted problem. First, the parametrization needs to comply with a set of properties to guarantee the security of the underlying scheme. Second, parametrization requires a deep understanding of the low-level primitives, since the parameters have conflicting effects on the precision, performance, and security of the scheme. Finally, the circuit to be executed influences, and is influenced by, the parametrization. Thus, there is no general optimal selection of parameters; the selection depends on the circuit and the application scenario. Currently, most existing HE frameworks require cryptographers to address these considerations manually, which demands a minimum of expertise acquired through a steep learning curve. In this paper, we propose a unified solution for the aforementioned challenges. Concretely, we present an expert system combining Fuzzy Logic and Linear Programming. The Fuzzy Logic Modules receive a user selection of high-level priorities for the security, efficiency, and performance of the cryptosystem. Based on these preferences, the expert system generates a Linear Programming Model that obtains optimal combinations of parameters by considering those priorities while preserving a minimum level of security for the cryptosystem. We conduct an extended evaluation showing that the expert system generates optimal parameter selections that respect user preferences without exposing the user to the inherent complexity of analyzing the circuit.

CERN Document Server

Towards realistic privacy-preserving deep learning over encrypted medical data

Author: Cabrero-Holgueras José
Pastrana Sergio
Publication date: 01/01/2023

Cardiovascular disease accounts for a substantial fraction of the burden on healthcare systems. The invisible nature of these pathologies demands solutions that enable remote monitoring and tracking. Deep Learning (DL) has arisen as a solution in many fields, and in healthcare, multiple successful applications exist for image enhancement and health outside hospitals. However, the computational requirements and the need for large-scale datasets limit DL. Thus, computation is often offloaded onto server infrastructure, and various Machine-Learning-as-a-Service (MLaaS) platforms have emerged from this need. These make it possible to run heavy computations in a cloud infrastructure, usually equipped with high-performance computing servers. Unfortunately, barriers persist in healthcare ecosystems, since sending sensitive data (e.g., medical records or personally identifiable information) to third-party servers involves privacy and security concerns with legal and ethical implications. In the scope of Deep Learning for Healthcare to improve cardiovascular health, Homomorphic Encryption (HE) is a promising tool to enable secure, private, and legal health outside hospitals. Homomorphic Encryption allows for computations over encrypted data, thus preserving the privacy of the processed information. Efficient HE requires structural optimizations to perform the complex computation of the internal layers. One such optimization is Packed Homomorphic Encryption (PHE), which encodes multiple elements in a single ciphertext, allowing for efficient Single Instruction over Multiple Data (SIMD) operations. However, using PHE in DL circuits is not straightforward, and it demands new algorithms and data encodings, which the existing literature has not adequately addressed. To fill this gap, in this work, we elaborate on novel algorithms to adapt the linear algebra operations of DL layers to PHE. Concretely, we focus on Convolutional Neural Networks.
We provide detailed descriptions and insights into the different algorithms and efficient inter-layer data format conversion mechanisms. We formally analyze the complexity of the algorithms in terms of performance metrics and provide guidelines and recommendations for adapting architectures that deal with private data. Furthermore, we confirm the theoretical analysis with practical experimentation. Among other conclusions, we prove that our new algorithms speed up the processing of convolutional layers compared to the existing proposals.

CERN Document Server